Training Acoustic Models with Speech Data from Different Languages

Authors

  • Chen Liu
  • Lynette Melnar

Abstract

We present a technique to train acoustic models for a target language using speech data from distinct source languages. In this approach, no native training data from the target language is required. The acoustic model candidates for each target-language phoneme are automatically selected from a group of existing source languages by means of a combined phonetic-phonological (CPP) metric, developed by incorporating statistically derived phonetic and phonological distance information (Liu and Melnar, Interspeech 2005). The method assumes the availability of sufficient native training data for the source languages and of pronunciation lexica for both the target and source languages. Once the model candidates are determined for each target-language phoneme, the target HMMs are trained with the speech data from the source languages by means of a "silkie-hen-on-duck-eggs" strategy: the target phoneme model training is embedded in the source phoneme model training. The recognition performance of the resulting models is comparable to that of our previously reported CPP-derived models built through multi-mixture construction, while the current models are only a fraction of the size of the previous ones, the exact fraction depending on the number of HMM candidates used for each target phoneme. Using the CPP metric, both versions of the models reach the performance of models generated by a data-driven acoustic-distance mapping approach and perform far better than general phoneme-symbol-based cross-language transfer strategies.
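
The two steps in the abstract (per-phoneme candidate selection with a combined distance, then training the target model on source-language data aligned to those candidates) can be illustrated with a toy sketch. It is not the authors' implementation. It assumes, hypothetically, that the CPP score is a weighted sum of one phonetic and one phonological distance (the abstract does not give the formula), and it replaces embedded HMM training with a single pooled one-dimensional Gaussian per target phoneme. The function names, phoneme labels, distance values and frame values are all made up for illustration.

```python
from statistics import fmean, pvariance


def cpp_distance(phonetic: float, phonological: float, weight: float = 0.5) -> float:
    """Hypothetical combined score: a weighted sum of two distances.
    This is an assumption for illustration, not the published CPP formula."""
    return weight * phonetic + (1.0 - weight) * phonological


def select_candidates(target, sources, phonetic, phonological, n_best=2, weight=0.5):
    """Rank source phonemes by combined distance to `target`; keep the n_best closest."""
    scored = [
        (cpp_distance(phonetic[(target, s)], phonological[(target, s)], weight), s)
        for s in sources
    ]
    scored.sort()  # smallest combined distance first
    return [s for _, s in scored[:n_best]]


def pooled_gaussian(candidates, aligned_frames):
    """Simplified stand-in for embedded training: pool the (1-D) frames that the
    source-language alignment assigned to the candidate source phonemes and
    estimate one Gaussian (mean, population variance) for the target phoneme."""
    frames = [x for s in candidates for x in aligned_frames.get(s, [])]
    return fmean(frames), pvariance(frames)


# Hypothetical toy data: target phoneme "S" and three source-language phonemes.
sources = ["de/S", "en/S", "es/tS"]
phonetic = {("S", "de/S"): 0.1, ("S", "en/S"): 0.05, ("S", "es/tS"): 0.4}
phonological = {("S", "de/S"): 0.2, ("S", "en/S"): 0.1, ("S", "es/tS"): 0.3}
aligned_frames = {"en/S": [1.0, 2.0], "de/S": [3.0], "es/tS": [10.0]}

candidates = select_candidates("S", sources, phonetic, phonological, n_best=2)
mean, var = pooled_gaussian(candidates, aligned_frames)
print(candidates, mean, var)
```

In this toy, `n_best` plays the role of the number of HMM candidates per target phoneme; a real system would estimate full HMM state distributions inside the source models' Baum-Welch passes rather than pooling frames once.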

Similar resources

Cheap Bootstrap of Multi-Lingual Hidden Markov Models

In this work we investigate the usage of TV audio data for cross-language training of multi-lingual acoustic models. We intend to take advantage of the availability of a training speech corpus formed by parallel news uttered in different languages and transmitted over separate audio channels. Spanish, French and Russian phone Hidden Markov Models (HMMs) are bootstrapped using an unsupervised...

Improved Multilingual Training of Stacked Neural Network Acoustic Models for Low Resource Languages

This paper proposes several improvements to multilingual training of neural network acoustic models for speech recognition and keyword spotting in the context of low-resource languages. We concentrate on the stacked architecture where the first network is used as a bottleneck feature extractor and the second network as the acoustic model. We propose to improve multilingual training when the amo...

Language independent and unsupervised acoustic models for speech recognition and keyword spotting

Developing high-performance speech processing systems for low-resource languages is very challenging. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to train a multi-language bottleneck DNN. Language dependent and/or multi-language (all training languages) Tandem acoustic models are then trained. This work con...

Likelihood Probability Mismatch Analysis and Normalization in Multilingual Speech Applications

In this paper, using a multilingual speech recognition system, we examine the HMM likelihood scores among the different acoustic models and observe that scoring mismatches exist. The mismatches might come from the different recording environments in which the training data for each language were collected, or from different acoustic modeling structures. This analysis helps us understand the ...

Different size multilingual phone inventories and context-dependent acoustic models for language identification

Experimental work using phonotactic and syllabotactic approaches for automatic language identification (LID) is presented. Various questions have originated this research: what is the best choice for a multilingual phone inventory? Can a syllabic unit be of interest to extend the scope of the modeling unit? Are context-dependent (CD) acoustic models, widely used for speech recognition, able to ...

Publication date: 2005